Identifying essential genes in genome-scale metabolic models of consensus molecular subtypes of colorectal cancer

Identifying essential targets in the genome-scale metabolic networks of cancer cells is a time-consuming process. The present study proposed a fuzzy hierarchical optimization framework for identifying essential genes, metabolites and reactions. On the basis of four objectives, the present study developed a framework for identifying essential targets that lead to cancer cell death and evaluating metabolic flux perturbations in normal cells that have been caused by cancer treatment. Through fuzzy set theory, a multiobjective optimization problem was converted into a trilevel maximizing decision-making (MDM) problem. We applied nested hybrid differential evolution to solve the trilevel MDM problem to identify essential targets in genome-scale metabolic models for five consensus molecular subtypes (CMSs) of colorectal cancer. We used various media to identify essential targets for each CMS and discovered that most targets affected all five CMSs and that some genes were CMS-specific. We obtained experimental data on the lethality of cancer cell lines from the DepMap database to validate the identified essential genes. The results reveal that most of the identified essential genes were compatible with the colorectal cancer cell lines obtained from DepMap and that these genes, with the exception of EBP, LSS, and SLC7A6, could generate a high level of cell death when knocked out. The identified essential genes were mostly involved in cholesterol biosynthesis, nucleotide metabolisms, and the glycerophospholipid biosynthetic pathway. The genes involved in the cholesterol biosynthetic pathway were also revealed to be determinable, if a cholesterol uptake reaction was not induced when the cells were in the culture medium. However, the genes involved in the cholesterol biosynthetic pathway became non-essential if such a reaction was induced. Furthermore, the essential gene CRLS1 was revealed as a medium-independent target for all CMSs.

where the stochiometric matrices, N CA and N HT , for CA and HT models are reconstructed using Step (A)-(C) in Figure 1. The RNA-seq expressions for CA and HT cells and GPR associations in Recon3D are used to set the weighting factors, and CA HT k k c c , for UFD problems; the four groups of confidence reactions are assigned as follows: / 1 , high confidence 4 1 , medium confidence 2 3 , negativec confidence 4 1, other confidence or non-gene-expression 3. We have to provide CA and HT templates ( Figure 1D and 1E) for the ACTD framework for identifying anticancer targets. Clinical data of the fluxes and metabolite flow can be used as the CA and HT templates. However, genome-scale clinical data are currently not available. We use Eq.(S1) to compute optimal fluxes and metabolite flow rates for CA and HT cells to provide as the templates. The computational procedures are expressed in Fig. S2  Figure S2. Computational procedures to obtain the optimal fluxes and metabolite flow rates provided as CA and HT templates.
4. The ACTD problem is transformed into a maximizing decision-making (MDM) problem through fuzzy set theory as follows.
The maximum objective is obtained by solving the FBA problem, and to provide as the constraint value in the UFD problem: The optimal fluxes are obtained by solving the UFD problem, and to provide as the flux template: is a jth forward ( ) and backward ( ) flux for the CA and HT model.  The optimal fluxes and metaboite flow rates are provided as CA and HT templates for the ACTD problem: CA template: and , 1,..., and , 1,..., HT template: and , 1,..., and , 1,..., We apply the nested hybrid differential evolution (NHDE) algorithm to solve the MDM problem that mimics a wet-lab to identify anticancer targets as shown in Figure 1D to 1 M. The key iterative procedures of the NHDE algorithm are mutation, crossover and fitness evaluation (Table S1), that are introduced in the below section. Here, we explain how to compute a decision-making criterion (referred to as fitness in the NHDE algorithm) for each generated gene.
6. The NHDE algorithm is a parallel direct search algorithm that generates a population of target genes, and then each target is applied to find its corresponding optimal solution. A gene can regulate one or several reactions to restrict the lower and upper bounds of fluxes as in Eq.(S1) to express as follows.
Treated CA (TR) model: FBA problem: UFD problem: Perturbed HT (PH) model: FBA problem: UFD problem: where is the gene to be deleted 7. Using the similar procedures in Fig. S2, the optimal fluxes and metabolite flow rates for TR and PH model of the i th gene are obtained as follows.
The optimal fluxes and metaboite flow rates for TR and PH model of the ith gene are applied to evaluate membership functions in the MDM problem: TR model: and , 1,..., and , 1 8. The optimal solutions for all target genes obtained from Eq.(S5) are applied to compute membership function grades for fuzzy minimization, maximization, dissimilarity and similarity to yield the cell mortality grade ηTR, cell viability grade ηCV, and metabolic deviation grade ηMD as follows. Figure S3. One-side member function to attribute fuzzy minimization and maximization The cell mortality grade ηTR for treated CA model and cell viability grade ηCV for perturbed HT model are evaluated by the mean-min operation, and expressed as follows.  Figure S4. Two-side membership function to attribute fuzzy dissimilarity and similarity.
The metabolic deviation grade is defined by the mean-min calculation as follows.
The cell mortality, cell viability and metabolic deviation grades for each target are applied to evaluate the decision-making criterion ηD = (ηTR + min{ηTR, ηCV, ηMD})/2 in Eq.(S3). The criteria for all targets are then used in the selection and evaluation step of NHDE (Step 5 in Table S2) to reproduce the next better individuals. The NHDE algorithm carries on iterative crossover, mutation procedures as discussed in the next subsection.

Introduction to Nested Hybrid Differential Evolution (NHDE)
The anticancer target discovery (ACTD) platform can be formulated as a fuzzy multi-objective hierarchical optimization problem. The ACTD platform can be transformed into a maximizing decision-making (MDM) problem by using fuzzy set theory to derive Pareto solutions as shown in Figure S1. The existence and limitation of the transformation have proved in Wang, et al. (2021). Figure S5. Description of the mathematical strategy for solving a fuzzy multiobjective hierarchical optimization problem.
The MDM problem is rewritten as the following simplified formulation for easily explaining the NHDE algorithm. The mutation operator of NHDE adopted from DE was an essential component compared with other evolutionary algorithms. Different from conventional evolutionary algorithms, the mutation operation of DE/NHDE uses the difference between two or four randomly chosen individuals as an evolutionary direction. The i th mutant individual (z G )i in generation G is obtained through the difference of two or four random individuals as expressed in the following form: indicates the i th mutant individual in the previous generation. The mutation operation may cause the mutant individual escape from the search domain. The mutation operation may cause the mutant individual to escape the search domain (i.e., bounds are violated). If this occurs, it is replaced by a random number within the lower and upper bounds of the particular decision variable, thus restricting to the search domain. The choice of mutation factor for DE/NHDE is heuristic and random. When population diversity is low, candidate individuals rapidly cluster together such that the individuals cannot be further improved, and premature convergence occurs. Similar to conventional evolutionary algorithms, the local population diversity could be increased by using a crossover operation such as a binomial crossover.
NHDE use the difference between two or four mutually independent individuals to determine the direction of search and obtain a mutant individual. This differential mutation converges quickly so that most individuals cluster around the best candidate individual in some generations. Consequently, the population diversity and exploration capability diminish and clustered individuals are unable to reproduce more diversified individuals through the mutation operation because the weighted difference is nearly zero. The recombination of mutant individuals and their clustered parents further prevents the reproduction of a diversified population. Therefore, all individuals quickly cluster together and superior individuals cannot be generated through mutation and crossover operations.
The migration operation of the NHDE algorithm is used to help individuals escape from the local cluster, but this operation is performed only if the population diversity falls below a desired level. The degree of population diversity is introduced to check whether the migration operation should be performed. Each element of the i th individual (z G )i in generation G is referred to as a gene of the individual, and the gene diversity index dzji is given by 0, if , 1,..., ; 1,..., ; 1, otherwise, where zji G and zjb G are the j th gene of the ith and best individual at the G th generation, respectively. dzji is set to zero if the j th gene of the i th individual is identical to the best gene; otherwise it is set to one (Chiou and Wang, 1999;Liao, et al., 2001). is defined as the ratio of total gene diversities to the total number of genes other than those of the best individual:

 
The value of population diversity degree ranges between zero and one. A value of zero implies that all of the genes are clustered around the best individual. On the other hand, a value of one indicates that current candidate individuals are a completely diversified population. The desired tolerance for population diversity is assigned by the user. A tolerance value of zero implies that the migration operation in NHDE is switched off, and one implies that the migration operation is performed at every generation. Consequently, the user can set a tolerance value for population diversity degree, (0, 1). If is smaller than , then NHDE performs migration operations to regenerate a new population in order to escape from a local point; otherwise, NHDE suspends the migration operation and maintains a constant search direction toward finding a new solution. Table S2. The NHDE algorithm for iteratively selecting a set of candidate enzymes and to infer optimal oncogenes. Figure S6. Flowchart of the parallel search algorithm in NHDE